Abstract

This paper proposes a new algorithm for linear system identification from noisy measurements. The proposed algorithm balances a data fidelity term with a norm induced by the set of single pole filters. We pose a convex optimization problem that approximately solves the atomic norm minimization problem and identifies the unknown system from noisy linear measurements. This problem can be solved efficiently with standard, freely available software. We provide rigorous statistical guarantees that explicitly bound the estimation error (in the $\mathcal{H}_2$-norm) in terms of the stability radius, the Hankel singular values of the true system, and the number of measurements. These results in turn yield complexity bounds and asymptotic consistency. We provide numerical experiments demonstrating the efficacy of our method for estimating linear systems from a variety of linear measurements.

Keywords System identification. Atomic norms. Hankel operators. Optimization. 

1 Introduction 

Identifying dynamical systems from noisy observations of their input-output behavior is of fundamental importance in systems and control theory. Often, models derived from physical first principles are not available to the control engineer, and computing a surrogate model from data is essential to the design of a control system. System identification from data is thus ubiquitous in problem domains such as process engineering, dynamic modeling of mechanical and aerospace systems, and systems biology. Though there are myriad approaches and excellent texts on the subject (see, for example, [13]), there is still no universally agreed upon approach for this problem. One reason is that quantifying the interplay between system parameters, measurement noise, and model mismatch tends to be challenging.

This paper draws novel connections between contemporary high-dimensional statistics, operator theory, and linear systems theory to construct provably consistent estimators of linear systems from small measurement sets. In particular, building on recent studies of atomic norms in estimation theory [Hid], we propose a penalty function that encourages estimated models to have small McMillan degree.

A related family of system identification techniques uses finite sample Hankel matrices to estimate dynamical system models, using either singular value decompositions (e.g., [22, 15]) or semidefinite programming [3, 12, 21, 8]. None of these techniques comes with statistical guarantees on the quality of estimation from finite noisy data, and it is difficult to determine how sensitive these methods are to the hidden system parameters or to measurement noise. Moreover, since these methods work with finite, truncated Hankel matrices, one can never be certain that the size of the Hankel matrix is sufficient to reveal the true McMillan degree. Finally, the techniques based on semidefinite programming are challenging to scale to very large problems, as their complexity grows superlinearly with the number of measurements.
In contrast, the atomic norm regularizer proposed in this paper is not only equivalent (up to constant factors) to the sum of the Hankel singular values (the Hankel nuclear norm), but is also well approximated by a finite-dimensional $\ell_1$ minimization problem. We show that solving least-squares problems regularized by our atomic norm is consistent, and that the estimation error scales gracefully with the stability radius, the McMillan degree of the system to be identified, and the number of measurements. Our numerical experiments validate these theoretical underpinnings and show that our method has great promise for providing concrete estimates on the hard limits of estimating linear systems.

1.1 Notation 

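We adopt standard notation; $\mathbb{D}$ and $\mathbb{S}$ will denote respectively the open unit disk and the unit circle in the complex plane $\mathbb{C}$. $\mathcal{H}_2$ and $\mathcal{H}_\infty$ will denote the Hardy spaces of functions analytic outside $\mathbb{D}$, with the norms

$$\|f\|_{\mathcal{H}_2} = \left(\frac{1}{2\pi}\int_0^{2\pi} |f(e^{i\theta})|^2\,d\theta\right)^{1/2} \quad \text{and} \quad \|f\|_{\mathcal{H}_\infty} = \sup_{z \in \mathbb{S}} |f(z)|,$$

respectively. $\ell_2([a, b])$ will denote the set of square-summable sequences on the integers in $[a, b]$.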
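As a quick numerical illustration of these two definitions (not part of the method itself), both norms can be approximated by sampling $f$ on a uniform grid of the unit circle. The helper `hardy_norms` and the grid size are illustrative assumptions. For rational $f$ with poles inside $\mathbb{D}$, the uniform average converges very quickly to the $\mathcal{H}_2$ integral, while the maximum over the grid only approximates the $\mathcal{H}_\infty$ supremum from below.

```python
import numpy as np

def hardy_norms(f, num_points=4096):
    """Approximate (||f||_H2, ||f||_Hinf) from samples of f on a uniform grid of the unit circle.

    f must accept a NumPy array of complex points and return f evaluated at each point.
    """
    theta = 2 * np.pi * np.arange(num_points) / num_points
    values = f(np.exp(1j * theta))
    h2 = np.sqrt(np.mean(np.abs(values) ** 2))   # (1/2pi) * integral, as a uniform average
    hinf = np.max(np.abs(values))                # sup over the circle, approximated on the grid
    return h2, hinf

# Example: f(z) = 1/(z - 0.5) has impulse response 0.5**(k-1), so ||f||_H2^2 = 1/(1 - 0.25) = 4/3,
# and its peak gain on the circle is 1/(1 - 0.5) = 2, attained at z = 1 (a grid point).
h2, hinf = hardy_norms(lambda z: 1 / (z - 0.5))
print(h2, np.sqrt(4 / 3), hinf)
```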

2 Atomic Decompositions of Transfer Functions 

We restrict our attention to SISO systems in this manuscript, as this simplifies the presentation; in the discussion, we describe how to extend our techniques to MIMO systems. Suppose we wish to estimate a SISO, LTI system with transfer function $G_\star(z)$ from a finite collection of measurements $y = \mathcal{L}(G_\star) + \omega$, where $\mathcal{L}$ is a linear measurement operator and $\omega$ is measurement noise. The set of all transfer functions is an infinite-dimensional space, so reconstructing $G_\star$ from these data is ill-posed. To make it well-posed, a common regularization approach constructs a penalty function $\mathrm{pen}(\cdot)$ that encourages "low-complexity" models and solves the optimization problem

$$\operatorname*{minimize}_{G}\ \|\mathcal{L}(G) - y\|_2^2 + \mu\,\mathrm{pen}(G). \qquad (2.1)$$

This formulation uses the parameter $\mu > 0$ to balance model complexity against fidelity to the data. The least-squares cost can be replaced by other convex loss functions if knowledge about the measurement noise is available (as in [2H, 16]), though in general it is less clear how to design a good penalty function.

In many applications, we know that the true model can be decomposed as a linear combination of very simple building blocks. For instance, sparse vectors can be written as short linear combinations of vectors from some discrete dictionary, and low-rank matrices can be written as a sum of a few rank-one factors. In [3], Chandrasekaran et al. proposed a universal heuristic for constructing regularizers based on such prior information. If we assume that

$$G_\star = \sum_{i=1}^{r} c_i a_i, \quad \text{for some } a_i \in \mathcal{A},\ c_i \in \mathbb{C},$$

where $\mathcal{A}$ is an origin-symmetric set of "atoms" normalized to have unit norm and $r$ is relatively small, then the appropriate penalty function is the gauge function (or Minkowski functional) induced by the atomic set $\mathcal{A}$:

$$\|G\|_{\mathcal{A}} := \inf\{t : G \in t\,\mathrm{conv}(\mathcal{A})\} = \inf\Big\{\sum_{a \in \mathcal{A}} |c_a| : G = \sum_{a \in \mathcal{A}} c_a a\Big\}. \qquad (2.2)$$
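For a finite, real-valued set of atoms, the gauge (2.2) is a linear program. The sketch below only illustrates the definition under that assumption (the atomic set used in this paper is infinite and complex-valued); `atomic_norm_finite` is a hypothetical helper, and SciPy's `linprog` with the HiGHS backend is just one convenient solver.

```python
import numpy as np
from scipy.optimize import linprog

def atomic_norm_finite(atoms, x):
    """Gauge (2.2) of x for the origin-symmetric set {+a_j, -a_j}, a_j = columns of `atoms`.

    Solves min sum_j |c_j| s.t. atoms @ c = x as a linear program, writing
    c = c_plus - c_minus with c_plus, c_minus >= 0. Returns np.inf if x is not
    in the span of the atoms (the infimum over an empty set).
    """
    atoms = np.asarray(atoms, dtype=float)
    x = np.asarray(x, dtype=float)
    n_atoms = atoms.shape[1]
    cost = np.ones(2 * n_atoms)                 # sum(c_plus + c_minus) = sum |c_j| at the optimum
    a_eq = np.hstack([atoms, -atoms])
    res = linprog(cost, A_eq=a_eq, b_eq=x, bounds=(0, None), method="highs")
    return res.fun if res.status == 0 else np.inf

# With the standard basis as the atoms, the gauge is the ell_1 norm (here 3.5).
print(atomic_norm_finite(np.eye(3), [1.0, -2.0, 0.5]))

# Adding the unit-norm atom (e1 + e2)/sqrt(2) makes (1, 1, 0) cheaper: sqrt(2) instead of 2.
atoms = np.column_stack([np.eye(3), [1 / np.sqrt(2), 1 / np.sqrt(2), 0.0]])
print(atomic_norm_finite(atoms, [1.0, 1.0, 0.0]))
```

Richer atomic sets give smaller gauges, which is the mechanism by which the choice of atoms encodes the prior.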




In […], it is shown that minimizing the atomic norm subject to compressed measurements yields the tightest known bounds for recovering many classes of models from linear measurements. Moreover, in [1], the atomic norm regularizer was studied in the context of denoising problems and was found to produce consistent estimates at nearly optimal estimation error rates for many classes of atoms.

To apply these atomic norm techniques to system identification, we must first determine the appropriate set of atoms. For discrete-time LTI systems with small McMillan degree, we can decompose any finite-dimensional, strictly proper system $G(z)$ with distinct poles $p_1, \dots, p_d$ as

$$G(z) = \sum_{i=1}^{d} \frac{c_i}{z - p_i}$$

via a partial fraction expansion. Hence, it makes sense that our set of atoms should be single-pole transfer functions. We propose the following atomic set for linear systems:

$$\mathcal{A} = \left\{ \varphi_w(z) = \frac{1 - |w|^2}{z - w} : w \in \mathbb{D} \right\}.$$

The numerator is normalized so that the Hankel norm of each atom is 1. See the discussion in Section 3 for precisely why this normalization is desirable.
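The normalization can be checked numerically. The impulse response of $\varphi_w$ is $g_k = (1 - |w|^2)w^{k-1}$, so its Hankel matrix is rank one, and an $N \times N$ section of it has largest singular value $1 - |w|^{2N}$, which approaches 1. The sketch below assumes a $200 \times 200$ truncation and an arbitrary pole $w$; the helper names are hypothetical.

```python
import numpy as np
from scipy.linalg import hankel

def atom_impulse_response(w, length):
    """g_1, ..., g_length for phi_w(z) = (1 - |w|^2) / (z - w), i.e. g_k = (1 - |w|^2) w^(k-1)."""
    return (1 - abs(w) ** 2) * w ** np.arange(length)

def truncated_hankel(g, size):
    """size x size Hankel matrix H[i, j] = g_{i+j+1} (needs len(g) >= 2*size - 1)."""
    return hankel(g[:size], g[size - 1 : 2 * size - 1])

w = 0.9 * np.exp(0.3j)                      # an arbitrary pole inside the unit disk
H = truncated_hankel(atom_impulse_response(w, 400), 200)
sv = np.linalg.svd(H, compute_uv=False)
print(sv[0], sv[1], sv.sum())               # about 1, about 0, about 1
```

Because the Hankel matrix of each atom is rank one, its Hankel norm and Hankel nuclear norm coincide, both equal to 1.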

The atomic norm penalty function associated with these atoms is

$$\|G(z)\|_{\mathcal{A}} = \inf\Big\{\sum_{w \in \mathbb{D}} |c_w| : G(z) = \sum_{w \in \mathbb{D}} c_w \frac{1 - |w|^2}{z - w}\Big\}, \qquad (2.3)$$

where the summation implies that only a countable number of terms have nonzero coefficients. This expression finds the decomposition of $G(z)$ into a linear combination of single-pole systems such that the $\ell_1$ norm of the coefficients, weighted by the norms of the single-pole systems, is as small as possible.

With this penalty function in hand, we now turn to analyzing its utility. In Section 3, we first show that for most systems of interest $\|G\|_{\mathcal{A}}$ is a well-defined, bounded quantity. Moreover, we show that the atomic norm is equivalent to the nuclear norm of the Hankel operator associated with $G$. Hence, the models preferred by our penalty function will have low-rank Hankel operators, and thus low McMillan degrees.

In Section 4, we turn to computation, demonstrating practical algorithms for approximating atomic norm regularization problems for several classes of measurements. We show that with finite data, our atomic norm minimization problem is well approximated by a finite-dimensional $\ell_1$-norm regularization problem. In particular, using specialized algorithms adapted to the solution of the LASSO [23], we can solve atomic norm regularization problems in time competitive with techniques that regularize with the nuclear norm and with SVD-based subspace identification methods.

Finally, we analyze the statistical performance of atomic norm minimization in Section 5. We show that our algorithm is asymptotically consistent over several measurement ensembles of interest. We focus on sampling the transfer function on the unit circle and present error bounds in terms of the stability radius, Hankel singular values, $\mathcal{H}_\infty$ norm, and McMillan degree of the system to be estimated.
3 The Hankel Nuclear Norm and Atomic Norm Minimization



Let us first show that most LTI systems of interest do indeed have finite atomic norm, and, moreover, 
that the atomic norm is closely connected with the sum of the Hankel singular values. 

3.1 Preliminaries: the Hankel operator 

Recall that the Hankel operator $\Gamma_G$ of the transfer function $G$ is defined as the mapping from the past to the future under the transfer function $G$. Given a signal $u$ supported on $(-\infty, -1]$, the output under $G$ is given by $g * u$, where $*$ denotes convolution and $g$ is the impulse response of $G$:

$$G(z) = \sum_{k=1}^{\infty} g_k z^{-k}.$$

$\Gamma_G$ is then simply the projection of $g * u$ onto $[0, \infty)$. An introduction to Hankel operators in control theory can be found in [5, Chapter 4] or [21, Chapter 7].
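To make the past-to-future description concrete, the sketch below (an illustration with an arbitrarily chosen two-pole impulse response, not code from this paper) computes the future output of a strictly proper system driven by an input supported on the past in two ways: by direct convolution, and by applying a finite section of the Hankel matrix $[g_{i+j+1}]$ to the time-reversed past input.

```python
import numpy as np
from scipy.linalg import hankel

rng = np.random.default_rng(0)
m, T = 30, 20                                  # past length and future horizon (illustrative)
k = np.arange(1, m + T + 1)                    # k = 1, ..., m + T
g = 0.8 ** (k - 1) - 0.5 * (-0.6) ** (k - 1)   # impulse response g_1, g_2, ... of a 2-pole system

u_past = rng.standard_normal(m)                # u_{-m}, ..., u_{-1}; the input is zero from time 0 on

# Route 1: convolve the whole input with g and keep the output at times 0, ..., T-1.
u_full = np.concatenate([u_past, np.zeros(T)]) # entry p holds u at time p - m
h = np.concatenate([[0.0], g])                 # h[0] = g_0 = 0 because G is strictly proper
y_future = np.convolve(u_full, h)[m : m + T]

# Route 2: apply the finite Hankel section H[i, j] = g_{i+j+1} to the time-reversed past input.
H = hankel(g[:T], g[T - 1 : T - 1 + m])        # T x m matrix
y_hankel = H @ u_past[::-1]                    # entry j of u_past[::-1] is u_{-(j+1)}

print(np.max(np.abs(y_future - y_hankel)))     # agreement up to rounding error
```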

The Hankel norm of $G$ is the operator norm of $\Gamma_G$ considered as an operator mapping $\ell_2(-\infty, -1]$ to $\ell_2[0, \infty)$. The Hankel nuclear norm of $G$ is the nuclear norm (also known as the trace norm or Schatten 1-norm) of $\Gamma_G$. To be precise, an operator $\Gamma$ is in the trace class $S_1$ if the trace of $(\Gamma^*\Gamma)^{1/2}$ is finite. This implies first that $\Gamma$ is a compact operator and admits a singular value decomposition

$$\Gamma(f) = \sum_{i=1}^{\infty} \sigma_i \langle v_i, f \rangle u_i.$$

When $\Gamma = \Gamma_G$, the numbers $\sigma_i$ are called the Hankel singular values of $G$. Moreover, the Schatten 1-norm of $\Gamma$ is given by

$$\|\Gamma\|_1 = \operatorname{trace}\big((\Gamma^*\Gamma)^{1/2}\big) = \sum_{i=1}^{\infty} \sigma_i.$$

3.2 The atomic norm is equivalent to the Hankel nuclear norm 

The rank of the Hankel operator determines the McMillan degree of the linear system defined by $G$. Rank minimization is notoriously computationally challenging (see [20] for a discussion), and we do not expect to be able to directly penalize the rank of the Hankel operator in implementations. Thus, as is common, a reasonable heuristic for minimizing the rank of the Hankel operator is to minimize the sum of the Hankel singular values, i.e., the Schatten 1-norm of the Hankel operator. For rational transfer functions, we can compute the Hankel nuclear norm via a balanced realization [23]. On the other hand, while the maximal Hankel singular value can be written variationally as an LMI, we are not aware of any such semidefinite programming formulation for the Hankel nuclear norm.
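The link between rank, McMillan degree, and the nuclear norm can be seen on a finite Hankel section. The sketch assumes the hypothetical example $G(z) = 1/(z - 0.8) - 0.5/(z + 0.6)$, which has McMillan degree 2, and a $300 \times 300$ truncation; the singular values of the truncation approximate the Hankel singular values of $G$.

```python
import numpy as np
from scipy.linalg import hankel

# G(z) = 1/(z - 0.8) - 0.5/(z + 0.6): McMillan degree 2, impulse response g_k below.
N = 300                                        # size of the finite Hankel section (assumption)
k = np.arange(1, 2 * N)                        # k = 1, ..., 2N - 1
g = 0.8 ** (k - 1) - 0.5 * (-0.6) ** (k - 1)

H = hankel(g[:N], g[N - 1 : 2 * N - 1])        # H[i, j] = g_{i+j+1}
sv = np.linalg.svd(H, compute_uv=False)        # approximate Hankel singular values

numerical_rank = int(np.sum(sv > 1e-10 * sv[0]))
nuclear_norm = sv.sum()                        # Schatten 1-norm of the finite section
print(numerical_rank)                          # 2, the McMillan degree
print(nuclear_norm, numerical_rank * sv[0])    # nuclear norm <= degree * Hankel norm
```

The last line illustrates the bound $\|\Gamma_G\|_1 \le d\,\|\Gamma_G\|$ for a system of McMillan degree $d$, which is used after Corollary 5.2.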

The following theorem provides a path towards minimizing the Hankel nuclear norm by minimizing the atomic norm $\|G(z)\|_{\mathcal{A}}$ as a proxy. Indeed, from the viewpoint of Banach space theory, the atomic norm is equivalent to the Hankel nuclear norm.

Theorem 3.1 Let $G \in \mathcal{H}_2$. Then $\Gamma_G$ is trace class if and only if there exist a sequence $\{\lambda_k\} \in \ell_1$ and a sequence $\{w_k\}$ with $w_k \in \mathbb{D}$ such that

$$G(z) = \sum_{k=1}^{\infty} \lambda_k \frac{1 - |w_k|^2}{z - w_k}. \qquad (3.1)$$
Moreover, we have the following chain of inequalities:

$$\tfrac{1}{8}\|G\|_{\mathcal{A}} \le \|\Gamma_G\|_1 \le \|G\|_{\mathcal{A}}, \qquad (3.2)$$

where $\|G\|_{\mathcal{A}}$ is given by (2.2).
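The upper bound in (3.2) follows from the fact that each atom's Hankel operator has rank one and nuclear norm 1, so $\|\Gamma_G\|_1 \le \sum_k |\lambda_k|$ for any decomposition (3.1). The sketch below checks this numerically on a random combination of atoms (the poles, coefficients, and truncation size are arbitrary assumptions). A finite section of $\Gamma_G$ can only have a smaller nuclear norm than the full operator, so the check is conservative; the lower bound is not tested here.

```python
import numpy as np
from scipy.linalg import hankel

rng = np.random.default_rng(0)
K, N = 5, 300                               # number of atoms, truncation size (illustrative)
w = 0.85 * np.sqrt(rng.uniform(size=K)) * np.exp(2j * np.pi * rng.uniform(size=K))
lam = rng.standard_normal(K) + 1j * rng.standard_normal(K)

# Impulse response of G(z) = sum_k lam_k (1 - |w_k|^2) / (z - w_k):
# g_j = sum_k lam_k (1 - |w_k|^2) w_k^(j-1), for j = 1, ..., 2N - 1.
j = np.arange(2 * N - 1)                    # exponent j - 1
g = ((lam * (1 - np.abs(w) ** 2))[None, :] * w[None, :] ** j[:, None]).sum(axis=1)

H = hankel(g[:N], g[N - 1 : 2 * N - 1])     # finite section of Gamma_G
nuclear = np.linalg.svd(H, compute_uv=False).sum()
coef_l1 = np.abs(lam).sum()                 # cost of this particular decomposition
print(nuclear, coef_l1)                     # nuclear <= coef_l1
```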



Proof Outline Theorem 3.1 follows by carefully combining several different results from operator theory. Peller first showed that transfer functions with trace-class Hankel operators form a Besov space [18]. Peller's argument can be found in his book [19]. The atomic decomposition of such operators is due to Coifman and Rochberg [4]. The norm bounds (3.2) were proven by Bonsall and
Walsh [2j. There they show that the | is the best possible lower bound. They also show that if 
llTglli ^ C'lls'llyil for all g, then C must be at least ^, so the chain of inequalities is nearly optimal. 
A concise presentation of the full argument can be found in [17]. A modern perspective using the theory of reproducing kernels can be found in [25].

Theorem 3.1 asserts that a transfer function has finite atomic norm if and only if the sum of its Hankel singular values is finite. In particular, every stable, strictly proper rational transfer function has finite atomic norm. More importantly, the atomic norm is equivalent to the Hankel nuclear norm. Thus, if we can approximately solve atomic norm minimization, we can approximately solve Hankel nuclear norm minimization, and vice versa. We now turn to such computational considerations.

4 Algorithms for atomic norm minimization 

From here on, let us assume that the system $G_\star$ that we seek to estimate has all of its poles of magnitude at most $\rho$ (we call $\rho$ the stability radius and treat it as a known parameter). Let $\mathbb{D}_\rho$ denote the set of all complex numbers with magnitude at most $\rho$. Note that if $G$ has stability radius $\rho$, then

$$\|G\|_{\mathcal{A}} := \inf\Big\{\sum_{w \in \mathbb{D}_\rho} |c_w| : G(z) = \sum_{w \in \mathbb{D}_\rho} c_w \frac{1 - |w|^2}{z - w}\Big\}.$$

That is, we can restrict our set of atoms to only those single-pole systems with stability radius at most $\rho$. For the remainder of this manuscript, we assume that $\mathcal{A}$ consists only of such single-pole systems.

In what follows, we focus our attention on linear measurement maps. Let $\mathcal{L}_i : \mathcal{H}_2 \to \mathbb{C}$ be a linear functional that serves as a measurement operator for the system $G(z)$. Many maps of interest can be phrased as linear functionals of the transfer function:

1. Samples of the frequency response, $\mathcal{L}_k(G) := G(e^{i\theta_k})$ for $k = 1, \dots, n$. From a control-theoretic perspective, this measurement operator corresponds to measuring the gain and phase of the linear system at different frequencies.

2. Samples of the impulse response, $\mathcal{L}_k(G) := g_{i_k}$, for $k = 1, \dots, n$ and $i_k \in [1, \infty)$.

3. Convolutions of the impulse response with a pseudorandom signal $u$: $\mathcal{L}_k(G) := \sum_{j=1}^{\infty} g_j u_{k-j}$.
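Each of these maps is easy to evaluate on a single atom $\varphi_w$, which is all the discretized method below needs. The sketch assumes, for the third ensemble, that the input $u$ vanishes before time 0; the function names are hypothetical.

```python
import numpy as np

def atom_frequency_samples(w, thetas):
    """Ensemble 1: L_k(phi_w) = phi_w(exp(i*theta_k))."""
    z = np.exp(1j * np.asarray(thetas))
    return (1 - abs(w) ** 2) / (z - w)

def atom_impulse_samples(w, indices):
    """Ensemble 2: L_k(phi_w) = g_{i_k}, with g_i = (1 - |w|^2) w^(i-1) for i >= 1."""
    i = np.asarray(indices)
    return (1 - abs(w) ** 2) * w ** (i - 1)

def atom_convolution_samples(w, u):
    """Ensemble 3: L_k(phi_w) = sum_{j=1}^{k} g_j u_{k-j}, k = 1..len(u).

    Assumes (for illustration) that the input satisfies u_t = 0 for t < 0, with u[0] = u_0.
    """
    n = len(u)
    g = atom_impulse_samples(w, np.arange(1, n + 1))   # g_1, ..., g_n
    return np.convolve(g, u)[:n]                       # entry k-1 equals L_k

w = 0.5 * np.exp(0.7j)
thetas = 2 * np.pi * np.arange(16) / 16
freq = atom_frequency_samples(w, thetas)
# Cross-check ensemble 1 against the series phi_w(z) = sum_i g_i z^(-i), truncated at i = 199.
idx = np.arange(1, 200)
series = np.array([np.sum(atom_impulse_samples(w, idx) * np.exp(-1j * th * idx)) for th in thetas])
impulse_input = np.array([1.0, 0.0, 0.0, 0.0])
print(np.max(np.abs(freq - series)))                   # tiny truncation error
print(atom_convolution_samples(w, impulse_input))      # equals g_1, ..., g_4
```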
In all of these cases, we consider the problem

$$\operatorname*{minimize}_{G}\ \frac{1}{2}\sum_{i=1}^{n} |\mathcal{L}_i(G) - y_i|^2 + \mu\|G\|_{\mathcal{A}}. \qquad (4.1)$$
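This problem is equivalent to the constrained, semi-infinite programming problem

$$\begin{aligned} \operatorname*{minimize}_{x,\,G}\quad & \frac{1}{2}\sum_{k=1}^{n} |x_k - y_k|^2 + \mu\|G\|_{\mathcal{A}} \\ \text{subject to}\quad & x_k = \mathcal{L}_k(G), \quad k = 1, \dots, n. \end{aligned}$$

Substituting an atomic decomposition $G = \sum_{w \in \mathbb{D}_\rho} c_w \varphi_w$ gives yet another equivalent formulation:

$$\begin{aligned} \operatorname*{minimize}_{x,\,c}\quad & \frac{1}{2}\sum_{k=1}^{n} |x_k - y_k|^2 + \mu\sum_{w \in \mathbb{D}_\rho} |c_w| \\ \text{subject to}\quad & x_k = \sum_{w \in \mathbb{D}_\rho} c_w \mathcal{L}_k(\varphi_w), \quad k = 1, \dots, n. \end{aligned} \qquad (4.2)$$

Note that in this final formulation, our decision variables are $x$, a finite-dimensional vector, and $c_w$, the coefficients of the atomic decomposition. The infinite-dimensional variable $G$ has been eliminated.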



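Let us define a norm on $\mathbb{R}^n$ based on the formulation (4.2):

$$\|x\|_{\mathcal{L}(\mathcal{A})} := \inf\Big\{\sum_{w \in \mathbb{D}_\rho} |c_w| : x_k = \sum_{w \in \mathbb{D}_\rho} c_w \mathcal{L}_k(\varphi_w),\ k = 1, \dots, n\Big\}.$$

Then we see that problem (4.1) is equivalent to the denoising problem

$$\operatorname*{minimize}_{x}\ \frac{1}{2}\|x - y\|_2^2 + \mu\|x\|_{\mathcal{L}(\mathcal{A})}. \qquad (4.3)$$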

Note that the first term is simply the squared Euclidean distance between $y$ and $x$ in $\mathbb{R}^n$. The second term is an atomic norm on $\mathbb{R}^n$ induced by the image of the atomic set

$$\mathcal{L}(\mathcal{A}) = \{\mathcal{L}(\varphi_w) : w \in \mathbb{D}_\rho\}$$

under the measurement operator $\mathcal{L}$. In order to tractably solve (4.1), we thus need only focus on computational schemes for computing or approximating $\|x\|_{\mathcal{L}(\mathcal{A})}$. The following proposition asserts that we can approximate this finite-dimensional atomic norm via a sufficiently fine discretization of the disk $\mathbb{D}_\rho$.

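Proposition 4.1 Let $\mathbb{D}_\rho^{(\epsilon)}$ be a finite subset of $\mathbb{D}_\rho$ such that for any $w \in \mathbb{D}_\rho$ there exists a $v \in \mathbb{D}_\rho^{(\epsilon)}$ satisfying $|w - v| < \epsilon$, and let $\mathcal{A}_\epsilon = \{\varphi_v : v \in \mathbb{D}_\rho^{(\epsilon)}\}$. Define

$$\|x\|_{\mathcal{L}(\mathcal{A}_\epsilon)} = \inf\Big\{\sum_{v \in \mathbb{D}_\rho^{(\epsilon)}} |c_v| : x = \sum_{v \in \mathbb{D}_\rho^{(\epsilon)}} c_v \mathcal{L}(\varphi_v)\Big\}.$$

Then there exists a constant $C_\epsilon \in [0, 1]$ such that

$$C_\epsilon\|x\|_{\mathcal{L}(\mathcal{A}_\epsilon)} \le \|x\|_{\mathcal{L}(\mathcal{A})} \le \|x\|_{\mathcal{L}(\mathcal{A}_\epsilon)}.$$

The set $\mathbb{D}_\rho^{(\epsilon)}$ is called an $\epsilon$-net for the set $\mathbb{D}_\rho$. We show in the appendix that when $\mathcal{L}_k(G) = G(e^{i\theta_k})$, $C_\epsilon$ admits an explicit lower bound that tends to 1 as $\epsilon \to 0$. Other measurement ensembles can be treated similarly.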
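One simple way to build an $\epsilon$-net of $\mathbb{D}_\rho$ is to start from a square grid and project points onto the disk. This construction is an assumption for illustration only and is not necessarily the grid used in the experiments of Section 6; `disk_eps_net` is a hypothetical helper, and the final lines check the covering property empirically.

```python
import numpy as np

def disk_eps_net(rho, eps):
    """Hypothetical helper: a finite subset of {|v| <= rho} that is an eps-net of that disk.

    Every w is within eps/sqrt(2) of some point of a square grid with spacing eps.
    Grid points within eps/sqrt(2) of the disk are kept and projected radially onto it;
    projection onto a convex set does not increase the distance to points of the set.
    """
    m = int(np.ceil((rho + eps) / eps))
    axis = eps * np.arange(-m, m + 1)
    q = (axis[:, None] + 1j * axis[None, :]).ravel()
    q = q[np.abs(q) <= rho + eps / np.sqrt(2)]
    r = np.abs(q)
    scale = np.where(r > rho, rho / np.maximum(r, 1e-300), 1.0)
    return q * scale

rho, eps = 0.9, 0.05
net = disk_eps_net(rho, eps)

# Empirical check: distance from random points of the disk to the net.
rng = np.random.default_rng(1)
w = rho * np.sqrt(rng.uniform(size=2000)) * np.exp(2j * np.pi * rng.uniform(size=2000))
dist = np.min(np.abs(w[:, None] - net[None, :]), axis=1)
print(net.size, dist.max())   # worst observed distance stays below eps
```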



When we replace $\|x\|_{\mathcal{L}(\mathcal{A})}$ with its discretized counterpart $\|x\|_{\mathcal{L}(\mathcal{A}_\epsilon)}$ in (4.3), the problem

$$\operatorname*{minimize}_{x}\ \frac{1}{2}\|x - y\|_2^2 + \mu\|x\|_{\mathcal{L}(\mathcal{A}_\epsilon)}$$

is equivalent to

$$\operatorname*{minimize}_{c}\ \frac{1}{2}\|Mc - y\|_2^2 + \mu\sum_{j} |c_j|, \qquad (4.4)$$

where $M_{kj} = \mathcal{L}_k(\varphi_{w_j})$ and $j$ indexes the set $\mathbb{D}_\rho^{(\epsilon)} = \{w_j\}$. That is, $M$ is an $n \times |\mathbb{D}_\rho^{(\epsilon)}|$ matrix. Problem (4.4) is an $\ell_1$-regularized least-squares problem with real or complex data, depending on the specific problem. We call (4.4) Discretized Atomic Soft Thresholding (DAST), as coined in [1].

The DAST problem can be solved very efficiently with a variety of off-the-shelf tools, including SpaRSA [23], FPC [10], or more general-purpose packages such as YALMIP [14] or CVX [9].
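As a concrete, non-authoritative sketch of DAST, the code below solves (4.4) with the accelerated proximal gradient method FISTA, a generic LASSO solver used here in place of SpaRSA or FPC. It assumes frequency samples $z_k = e^{2\pi i k/n}$, a polar grid of candidate poles in $\mathbb{D}_{0.9}$, a two-pole true system whose poles lie on the grid, and an ad hoc choice $\mu = 0.1$ rather than the value prescribed by the theory in Section 5. The function names are hypothetical.

```python
import numpy as np

def dast_fista(M, y, mu, n_iter=3000):
    """FISTA for 0.5*||M c - y||_2^2 + mu * sum_j |c_j|, with complex M, y, c."""
    L = np.linalg.norm(M, 2) ** 2                          # Lipschitz constant of the gradient
    c = np.zeros(M.shape[1], dtype=complex)
    v, t = c.copy(), 1.0
    for _ in range(n_iter):
        z = v - (M.conj().T @ (M @ v - y)) / L             # gradient step at v
        mag = np.abs(z)
        c_next = np.maximum(1.0 - (mu / L) / np.maximum(mag, 1e-300), 0.0) * z  # complex soft threshold
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        v = c_next + ((t - 1.0) / t_next) * (c_next - c)   # momentum step
        c, t = c_next, t_next
    return c

def dast_objective(M, y, c, mu):
    return 0.5 * np.linalg.norm(M @ c - y) ** 2 + mu * np.abs(c).sum()

# Illustrative setup (all values are assumptions).
n = 80
z = np.exp(2j * np.pi * np.arange(1, n + 1) / n)          # frequency samples z_k
radii = np.arange(1, 10) / 10
angles = 2 * np.pi * np.arange(32) / 32
grid = (radii[:, None] * np.exp(1j * angles[None, :])).ravel()
M = (1 - np.abs(grid) ** 2)[None, :] / (z[:, None] - grid[None, :])   # M[k, j] = phi_{w_j}(z_k)

c_true = np.zeros(grid.size, dtype=complex)
c_true[4 * 32 + 4] = 1.0      # pole at 0.5 * exp(+i*pi/4)
c_true[4 * 32 + 28] = 1.0     # conjugate pole at 0.5 * exp(-i*pi/4)

rng = np.random.default_rng(0)
sigma, mu = 0.01, 0.1
y = M @ c_true + sigma * (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)

c_hat = dast_fista(M, y, mu)
rel_residual = np.linalg.norm(M @ c_hat - y) / np.linalg.norm(y)
support = np.flatnonzero(np.abs(c_hat) > 1e-3)
print(rel_residual, grid[support])
```

The estimate $\hat G(z) = \sum_j \hat c_j \varphi_{w_j}(z)$ is read off from the nonzero entries of `c_hat`; its poles are the corresponding grid points.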



DAST yields an approximate solution to problem (4.1) and, as we will see, a statistically consistent estimate, provided the parameter $\epsilon$ is chosen to meet the desired numerical accuracy.



5 Statistical Bounds 

Let $\mathcal{L}_i : \mathcal{H}_2 \to \mathbb{C}$ be a linear functional that serves as a measurement operator for the system $H(z)$. In this section, let us suppose that we obtain noisy measurements of the form

$$y_i = \mathcal{L}_i(H(z)) + \omega_i, \quad i = 1, \dots, n,$$

where $\omega_i$ is a noise sequence consisting of independent, identically distributed random variables. In this section, we specialize our results to the case where $\mathcal{L}$ returns samples of the frequency response at uniformly spaced frequencies:

$$\mathcal{L}_k(H(z)) = H(z_k), \quad z_k = e^{2\pi i k/n}, \quad k = 1, \dots, n.$$

While the techniques here extend to other measurement ensembles, these will be explored in a longer version of the paper.

Our goal in this section is to prove that solving the DAST optimization problem yields a good 
approximation to the transfer function we are probing. The following theorem provides a precise 
statistical guarantee on the performance of our algorithm. 



Theorem 5.1 Let $G_\star$ be a strictly proper transfer function with bounded Hankel nuclear norm. Suppose the noise sequence $\omega_i$ is i.i.d. Gaussian with mean zero and variance $\sigma^2$. Choose $\delta \in (0, 1)$

and let $\hat c$ be the optimal solution of (4.4) with



and set e = -^j^^- Let n^p^ be as in Proposition 



4.1 



= 2a J nlog 



ll/)2 



6(1 -p) 



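Set $\hat G(z) = \sum_{w \in \mathbb{D}_\rho^{(\epsilon)}} \hat c_w \frac{1 - |w|^2}{z - w}$. Then, if the set of vectors $\{\mathcal{L}(\varphi_a) : a \in \mathbb{D}_\rho^{(\epsilon)}\}$ spans $\mathbb{R}^n$, we

have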



- cum. < rs.l±i . .o. (^) ^« . ^ 

with probahility 1 — e~°^"^ . 
Corollary 5.2 There is a quantity $C$ depending on $\rho$ and $\sigma$ such that for sufficiently large $n$

\\G{z)-G,{z)\\l^ <C||rGj|in-i 
with probability exceeding 1 — e~°^'^\ 

Before we describe how to prove this theorem and its corollary, let us first unpack its features. First of all, the right-hand side is a function of the number of samples, the Hankel nuclear norm of the true system, and the stability radius of the true system. Also, if the McMillan degree of $G_\star(z)$ is $d$, then we can upper bound the Hankel nuclear norm by the product of the McMillan degree and the Hankel norm of $G_\star$: $\|\Gamma_{G_\star}\|_1 \le d\,\|\Gamma_{G_\star}\|$. Second, note that as $n$ tends to infinity, the right-hand side tends to zero. In particular, this means that our discretized algorithm is consistent, and we can quantify the worst-case convergence rate.



The proof of Theorem 5.1 is provided in the appendix. We prove this theorem by first upper bounding the $\mathcal{H}_2$ error in terms of the mean squared error on the observed samples. We show that this empirical mean squared error can be upper bounded in terms of the Hankel nuclear norm of $G_\star$ times the dual atomic norm of the noise sequence $\omega$. We estimate this dual norm, and in the process compute the optimal value of the regularization parameter. Putting these pieces together yields our main result.



6 Numerical Experiments 

In this section, we validate the proposed framework via some preliminary numerical experiments conducted in MATLAB. In many of the experiments where the solution of convex optimization problems was required, the software package CVX [9] was used. Throughout our experiments, the discretization of the unit disk was held to approximately 2000 points.

In the first experiment, we consider a stable system $G$ with two poles. We make $m = 80$ noisy observations of the frequency response by evaluating the transfer function $G(z_j)$ at regularly spaced frequencies $z_j = e^{i\theta_j}$ on the complex unit circle. The noise is additive i.i.d. zero-mean Gaussian with variance $\sigma^2 = 10^{-\ldots}$. We reconstruct $G(z)$ by DAST as proposed in Section 4. Our algorithm recovers a system of degree 6, which achieves an $\mathcal{H}_2$ error of 0.0043 and an $\mathcal{H}_\infty$ error of 0.0079. The locations of the true and recovered poles are depicted graphically in Fig. 1.

In Fig. 2, we again consider the problem of recovering a second-order system from noisy frequency response measurements. The noise variance is set to $\sigma^2 = 10^{-\ldots}$. The plot shows the performance of DAST as the number of measurements increases. The error metric used is the $\mathcal{H}_2$ norm.

In Fig. 3, we compare our algorithm to a widely used method known as subspace identification [13, Chapter 10]. A second-order system, starting from an initial condition $x[0] = \ldots$, is excited by a random input $u[t]$ consisting of an i.i.d. sequence of zero-mean, unit-variance Gaussian random variables for $m$ time steps. We record the output $y[t]$ of the system for $m$ time steps. From this input-output data, we use DAST and subspace identification to attempt to reconstruct the unknown system. We plot the estimation error in the $\mathcal{H}_2$ norm as $m$ is increased from 10 to 120 time steps. As is evident, DAST outperforms subspace identification when $m$ is small, i.e., on the order of 10 to 50 measurements.
[Figure 1: pole locations in the complex plane; panel title "$\mathcal{H}_\infty$: 0.0079, $\mathcal{H}_2$: 0.0043"; horizontal axis: Real]

Figure 1: In the figure above, the locations marked with a circle represent the locations of the poles (in the complex plane) of a second-order discrete-time LTI system. The locations marked with a cross correspond to poles recovered by DAST.

[Figure 2: horizontal axis: number of frequency observations]

Figure 2: In the figure above, we plot the $\mathcal{H}_2$ estimation error between the true system and the recovered system as the number of frequency measurements $m$ varies.

[Figure 3: horizontal axis: number of measurements (20 to 120)]

Figure 3: In the figure above, we compare the $\mathcal{H}_2$ estimation error of the algorithm proposed in this paper (DAST) and the error obtained by the subspace identification method.



Another aspect that we emphasize is that in these experiments, subspace identification was given the true system order. If the wrong model order was used, the performance of subspace identification worsened noticeably. By contrast, DAST does not need knowledge of the true system order.

7 Conclusion 

By using the atomic norm framework of [3], we were able to posit a reasonable regularizer for linear systems, understand the computational demands of such a regularizer, and analyze its statistical performance. Since it is closely connected to the Hankel nuclear norm but is computationally more practical, we believe that our atomic norm will be useful in a variety of practical implementations and also in theoretical analysis. However, several questions remain to be addressed before we fully understand the potential of this norm. We list some of these open problems here.

Other measurement ensembles Our analysis in Section 5 focused on the particular case of sampling the frequency response at regular intervals. By focusing on this example, we were able to illustrate the critical ingredients for computing convergence rates. First, we needed to show that our measurement error provided a reasonable upper bound on the distance to the true transfer function. Second, we used convex analysis to upper bound the measurement error in terms of the statistics of the noise process. Third, we estimated the noise statistics using probabilistic techniques, appealing to the structure of the atomic set and its $\epsilon$-nets. This methodology can be extended to the other sampling methods described in Section 4 and may also be extendable to estimating transfer functions from pairs of input-output time series.



Fast rates and minimax optimality The rates provided by Theorem 5.1 demonstrate that the DAST algorithm is asymptotically consistent. However, we believe the upper bound we have derived is quite crude. In particular, as discussed in [1], it may very well be possible to improve our upper bounds by leveraging more of the geometry of the set of single-pole transfer functions. It would be interesting to find reasonable lower bounds on the reconstruction error from limited measurements, and to see how closely we can match these worst-case estimates via a new analysis.



Interpolating with derivatives While gridding the space of poles enables us to quickly solve atomic norm problems, a main drawback is that we can never exactly localize the true poles of the system without an extremely fine grid. One recent proposal to enable such localization uses a linearization technique to simultaneously fit a model on the grid points and on the derivatives of the transfer functions at these grid points [Hj]. It would be of interest to see whether such an adjoining method could work in this setting; future experiments will evaluate the improvements on system identification in theory and in practice.



Extension to MIMO Systems While we focused on the single-input single-output (SISO) case in this paper, we expect that these techniques would extend to the multi-input multi-output (MIMO) case. One simple extension to note is that if the number of inputs and outputs in the system remains small, the SISO techniques presented herein could be applied to each input-output pair. Alternative approaches that avoid this pairwise identification would be important in large-scale systems with many inputs and outputs; this is an important direction for future research.

To extend our methodology to MIMO systems, we need to find an appropriate set of atoms. One possibility is the set

$$\mathcal{A} = \left\{ \frac{1 - |w|^2}{z - w}\, u v^* : w \in \mathbb{D},\ u \in \mathbb{C}^r,\ v \in \mathbb{C}^p,\ \|u\|_2 = \|v\|_2 = 1 \right\},$$

where $p$ is the number of inputs and $r$ is the number of outputs. Any discrete-time MIMO system with McMillan degree $d$ can be written in terms of $d$ of these atoms. The remaining difficulty is computing or approximating the norm $\|x\|_{\mathcal{L}(\mathcal{A})}$ for finite measurements, as described in Section 4.
A Useful Lemmas

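Before we proceed with the proofs of our main results, let us record a few useful lemmas. Recall that our atomic functions are defined as

$$\varphi_a(z) = \frac{1 - |a|^2}{z - a}.$$

Lemma A.1 For any $a \in \mathbb{D}$, $\|\varphi_a(z)\|_{\mathcal{H}_\infty} \le 2$.

Proof For $z = \exp(i\theta)$,

$$|\varphi_a(z)| = \frac{1 - |a|^2}{|z - a|} \le \frac{1 - |a|^2}{1 - |a|} = 1 + |a| < 2.$$

Lemma A.2 For any $a \in \mathbb{D}_\rho$ and $z_k = \exp(i\theta_k)$, $k = 1, 2$, we have

$$|\varphi_a(z_1) - \varphi_a(z_2)| \le \frac{1 + \rho}{1 - \rho}\,|\theta_1 - \theta_2|. \qquad (A.1)$$

Proof

$$|\varphi_a(z_1) - \varphi_a(z_2)| = (1 - |a|^2)\left|\frac{z_2 - z_1}{(z_1 - a)(z_2 - a)}\right| \le \frac{1 - |a|^2}{(1 - |a|)^2}\,|z_1 - z_2| = \frac{1 + |a|}{1 - |a|}\,|z_1 - z_2| \le \frac{1 + \rho}{1 - \rho}\,|\theta_1 - \theta_2|.$$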
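Both lemmas can be sanity-checked numerically on random poles $a \in \mathbb{D}_\rho$ and random pairs of points on the unit circle. The sketch below is only such a check (the value of $\rho$ and the sampling scheme are arbitrary assumptions), not a substitute for the proofs.

```python
import numpy as np

def phi(a, z):
    """Atom phi_a(z) = (1 - |a|^2) / (z - a), vectorized over a and z."""
    return (1 - np.abs(a) ** 2) / (z - a)

rng = np.random.default_rng(0)
rho, m = 0.9, 1000
a = rho * np.sqrt(rng.uniform(size=m)) * np.exp(2j * np.pi * rng.uniform(size=m))  # points of D_rho
theta1 = rng.uniform(0.0, 2 * np.pi, size=m)
theta2 = theta1 + rng.uniform(-0.2, 0.2, size=m)
z1, z2 = np.exp(1j * theta1), np.exp(1j * theta2)

# Lemma A.1: |phi_a(z)| <= 1 + |a| < 2 on the unit circle.
ratio_a1 = np.abs(phi(a, z1)) / (1 + np.abs(a))
# Lemma A.2: |phi_a(z1) - phi_a(z2)| <= (1 + rho) / (1 - rho) * |theta1 - theta2|.
ratio_a2 = np.abs(phi(a, z1) - phi(a, z2)) / ((1 + rho) / (1 - rho) * np.abs(theta1 - theta2))
print(ratio_a1.max(), ratio_a2.max())   # both at most 1
```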



Lemma A.3 For any $a, b \in \mathbb{D}_\rho$,



|r^„-r^J|i<3^|a-6|. (A.2) 
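Proof The Hankel operator for $\varphi_a(z)$ is given by the semi-infinite, rank-one matrix

$$\Gamma_{\varphi_a} = (1 - |a|^2)\begin{bmatrix} 1 & a & a^2 & a^3 & \cdots \\ a & a^2 & a^3 & a^4 & \cdots \\ a^2 & a^3 & a^4 & a^5 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{bmatrix} = (1 - |a|^2)\begin{bmatrix} 1 \\ a \\ a^2 \\ \vdots \end{bmatrix}\begin{bmatrix} 1 \\ a \\ a^2 \\ \vdots \end{bmatrix}^{T}.$$

Let $\zeta_a = \sqrt{1 - |a|^2}\,\begin{bmatrix} 1 & a & a^2 & \cdots \end{bmatrix}^T$, so that $\Gamma_{\varphi_a} = \zeta_a\zeta_a^T$. Note that $\zeta_a \in \ell_2$ with norm equal to 1. Also note that we have

$$\langle \zeta_a, \zeta_b \rangle = \frac{\sqrt{(1 - |a|^2)(1 - |b|^2)}}{1 - a\bar{b}}. \qquad (A.3)$$

Then we have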



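$$\begin{aligned}
\|\Gamma_{\varphi_a} - \Gamma_{\varphi_b}\|_1 &= \|\zeta_a\zeta_a^T - \zeta_b\zeta_b^T\|_1 = \|\zeta_a(\zeta_a - \zeta_b)^T + (\zeta_a - \zeta_b)\zeta_b^T\|_1 \\
&\le \|\zeta_a(\zeta_a - \zeta_b)^T\|_1 + \|(\zeta_a - \zeta_b)\zeta_b^T\|_1 \\
&= 2\|\zeta_a - \zeta_b\|_{\ell_2}
\end{aligned}$$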



< 



2p 



l-p 



a-b\ 



1 - ab 



(A.4a) 
(A.4b) 
(A.4c) 

(A.4d) 



Here, (A.4b) is the triangle inequality. (A.4c) follows because the nuclear norm of a rank-one operator is equal to the product of the $\ell_2$ norms of its factors. (A.4d) follows from (A.3). The final inequality follows from analyzing the Taylor series of the preceding expression.



B Dual norms 

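We record here a few basic properties of dual atomic norms that we need for our proofs (see [Hill] for more details). For an atomic set $\mathcal{A}$, the dual norm is given by

$$\|z\|_{\mathcal{A}}^* = \sup_{a \in \mathcal{A}} \langle a, z \rangle.$$

Note that for this norm, we have the generalization of Hölder's inequality $\langle x, z \rangle \le \|x\|_{\mathcal{A}}\,\|z\|_{\mathcal{A}}^*$. Moreover, note that we have the chain of inequalities

$$\alpha\|x\|_{\mathcal{A}'} \le \|x\|_{\mathcal{A}} \le \beta\|x\|_{\mathcal{A}'}$$

for some $\alpha < 1$ and $\beta > 1$ and for all $x$ if and only if

$$\beta^{-1}\|z\|_{\mathcal{A}'}^* \le \|z\|_{\mathcal{A}}^* \le \alpha^{-1}\|z\|_{\mathcal{A}'}^*$$

for all $z$.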
C Proof of Proposition 4.1



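First note that for any atomic sets $\mathcal{A} \subset \mathcal{A}'$, $\|x\|_{\mathcal{A}'} \le \|x\|_{\mathcal{A}}$. The harder part of this proposition is the lower bound. To proceed, we use the dual norm.

Let $\mathbb{D}_\rho^{(\tau)}$ be a subset of $\mathbb{D}_\rho$ such that for every $a \in \mathbb{D}_\rho$, there exists an $\hat a \in \mathbb{D}_\rho^{(\tau)}$ satisfying $|a - \hat a| < \tau$. For each $a \in \mathbb{D}_\rho$, we denote by $\hat a$ the closest point in $\mathbb{D}_\rho^{(\tau)}$ to $a$. Now observe that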
Now observe that 

8 16pT 

Here, the first inequality follows from our reasoning in Section 4. The second inequality is Theorem 3.1, and the final inequality is by (A.2).



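We then can compute

$$\begin{aligned}
\|z\|_{\mathcal{L}(\mathcal{A})}^* &= \sup_{a \in \mathbb{D}_\rho} \langle \mathcal{L}(\varphi_a), z \rangle \\
&= \sup_{a \in \mathbb{D}_\rho} \Big(\langle \mathcal{L}(\varphi_{\hat a}), z \rangle + \langle \mathcal{L}(\varphi_a - \varphi_{\hat a}), z \rangle\Big) \\
&\le \sup_{a \in \mathbb{D}_\rho} \langle \mathcal{L}(\varphi_{\hat a}), z \rangle + \sup_{a \in \mathbb{D}_\rho} \langle \mathcal{L}(\varphi_a - \varphi_{\hat a}), z \rangle \\
&\le \|z\|_{\mathcal{L}(\mathcal{A}_\tau)}^* + \sup_{a \in \mathbb{D}_\rho} \|\mathcal{L}(\varphi_a - \varphi_{\hat a})\|_{\mathcal{L}(\mathcal{A})}\,\|z\|_{\mathcal{L}(\mathcal{A})}^*
\end{aligned}$$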

<iizii* I ^^^^ ii~r 



Rearranging both sides of this inequality gives 
with Cr = 1 — J(i^p) 1 completing the proof. 

D Optimality conditions for DAST 

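The following two important inequalities were proven in [1, Section 2]:

Theorem D.1 Let $\Omega \subset \mathbb{R}^n$ be an arbitrary set of atoms. Suppose that we observe $y = x_\star + \omega$, where $\omega \sim \mathcal{N}(0, \sigma^2 I)$. Let $\hat x$ denote the optimal solution of

$$\operatorname*{minimize}_{x}\ \frac{1}{2}\|x - y\|_2^2 + \mu\|x\|_\Omega$$

with $\mu \ge \|\omega\|_\Omega^*$. Then we have

$$\|\hat x - x_\star\|_2^2 \le 2\mu\|x_\star\|_\Omega, \qquad (D.1)$$

$$\|\hat x\|_\Omega \le \|x_\star\|_\Omega + \mu^{-1}\langle \omega, \hat x - x_\star \rangle. \qquad (D.2)$$

We will use these inequalities in the following proof.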
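Theorem D.1 can be checked on the simplest atomic set, $\Omega = \{\pm e_1, \dots, \pm e_n\}$, for which $\|\cdot\|_\Omega$ is the $\ell_1$ norm, the dual norm is the $\ell_\infty$ norm, and the denoising problem is solved by soft thresholding. The sketch below uses the oracle choice $\mu = \|\omega\|_\infty$, which requires knowing the noise and is only for illustration; the signal and noise levels are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 200, 0.1
x_star = np.zeros(n)
x_star[rng.choice(n, size=5, replace=False)] = rng.normal(0.0, 3.0, size=5)  # sparse signal
omega = sigma * rng.standard_normal(n)
y = x_star + omega

mu = np.max(np.abs(omega))    # ||omega||_Omega^* for Omega = {+-e_i}: the smallest admissible mu
x_hat = np.sign(y) * np.maximum(np.abs(y) - mu, 0.0)   # soft thresholding solves the denoising problem

lhs_d1 = np.sum((x_hat - x_star) ** 2)
rhs_d1 = 2 * mu * np.sum(np.abs(x_star))
lhs_d2 = np.sum(np.abs(x_hat))
rhs_d2 = np.sum(np.abs(x_star)) + omega @ (x_hat - x_star) / mu
print(lhs_d1 <= rhs_d1, lhs_d2 <= rhs_d2)
```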
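E Proof of Theorem 5.1

To upper bound the $\mathcal{H}_2$ norm, let us use some properties of functions that admit atomic decompositions. Note that for any function $H(z) = \sum_{w \in \mathbb{D}_\rho} c_w \varphi_w(z)$ with $\|H\|_{\mathcal{A}} = \sum_{w \in \mathbb{D}_\rho} |c_w|$, we have, for any $z_1 = e^{i\theta_1}$ and $z_2 = e^{i\theta_2}$ in $\mathbb{S}$,

$$\begin{aligned}
|H(z_1)|^2 - |H(z_2)|^2 &= \big(|H(z_1)| - |H(z_2)|\big)\big(|H(z_1)| + |H(z_2)|\big) \\
&\le 2\,|H(z_1) - H(z_2)|\,\|H\|_{\mathcal{H}_\infty} \\
&\le 2\Big(\sum_{w} |c_w|\,|\varphi_w(z_1) - \varphi_w(z_2)|\Big)\Big(\sum_{w} |c_w|\,\|\varphi_w\|_{\mathcal{H}_\infty}\Big) \qquad \text{(E.1a)} \\
&\le 4\,\frac{1 + \rho}{1 - \rho}\,\|H\|_{\mathcal{A}}^2\,|\theta_1 - \theta_2|. \qquad \text{(E.1b)}
\end{aligned}$$

Here, (E.1a) is the triangle inequality and (E.1b) uses Lemmas A.1 and A.2.

Let $\Delta = \hat G - G_\star$ and $\theta_k = 2\pi k/n$. Then we can bound the $\mathcal{H}_2$ norm of $\Delta$ as

$$\begin{aligned}
\|\Delta\|_{\mathcal{H}_2}^2 &= \frac{1}{2\pi}\sum_{k=0}^{n-1}\int_{\theta_k}^{\theta_{k+1}} |\Delta(e^{i\theta})|^2\,d\theta \\
&\le \frac{1}{2\pi}\sum_{k=0}^{n-1}\int_{\theta_k}^{\theta_{k+1}} \Big(|\Delta(e^{i\theta_k})|^2 + 4\,\frac{1 + \rho}{1 - \rho}\,\|\Delta\|_{\mathcal{A}}^2\,|\theta - \theta_k|\Big)\,d\theta \\
&= \frac{1}{n}\sum_{k=0}^{n-1} |\Delta(e^{i\theta_k})|^2 + \frac{4\pi}{n}\,\frac{1 + \rho}{1 - \rho}\,\|\Delta\|_{\mathcal{A}}^2.
\end{aligned}$$

The inequality here follows from our preceding argument. Since $e^{i\theta_0} = e^{i\theta_n}$, the sum runs over the same $n$ sample points $z_1, \dots, z_n$ used in Section 5.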

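Now, in this expression, we need to upper bound both the size of $\Delta$ on the measured frequencies and its atomic norm. We will bound the latter in terms of the former:

$$\begin{aligned}
\|\Delta\|_{\mathcal{A}} &\le \|G_\star\|_{\mathcal{A}} + \|\hat G\|_{\mathcal{A}} \qquad \text{(E.2a)} \\
&\le \|G_\star\|_{\mathcal{A}} + \|\mathcal{L}(\hat G)\|_{\mathcal{L}(\mathcal{A}_\epsilon)} \qquad \text{(E.2b)} \\
&\le \|G_\star\|_{\mathcal{A}} + \|\mathcal{L}(G_\star)\|_{\mathcal{L}(\mathcal{A}_\epsilon)} + \mu^{-1}\langle \omega, \mathcal{L}(\hat G - G_\star) \rangle \qquad \text{(E.2c)} \\
&\le \|G_\star\|_{\mathcal{A}} + (1 - \delta)^{-1}\|\mathcal{L}(G_\star)\|_{\mathcal{L}(\mathcal{A})} + \mu^{-1}\langle \omega, \mathcal{L}(\hat G - G_\star) \rangle \qquad \text{(E.2d)} \\
&\le \frac{2 - \delta}{1 - \delta}\,\|G_\star\|_{\mathcal{A}} + \mu^{-1}\langle \omega, \mathcal{L}(\hat G - G_\star) \rangle \qquad \text{(E.2e)} \\
&\le \frac{2 - \delta}{1 - \delta}\,\|G_\star\|_{\mathcal{A}} + \mu^{-1}\|\omega\|_2\Big(\sum_{k=1}^{n} |\Delta(e^{i\theta_k})|^2\Big)^{1/2}. \qquad \text{(E.2f)}
\end{aligned}$$

(E.2a) is the triangle inequality. (E.2b) follows from how we defined $\hat G$. (E.2c) follows from Theorem D.1, and (E.2d) follows from Proposition 4.1. (E.2e) follows because $\|\mathcal{L}(H)\|_{\mathcal{L}(\mathcal{A})} \le \|H\|_{\mathcal{A}}$ for all linear maps $\mathcal{L}$ and transfer functions $H$. (E.2f) is Hölder's inequality. Note that the quantity $\|\mathcal{L}(G_\star)\|_{\mathcal{L}(\mathcal{A}_\epsilon)}$ could be infinite if the set $\{\mathcal{L}(\varphi_a) : a \in \mathbb{D}_\rho^{(\epsilon)}\}$ does not span $\mathbb{R}^n$. This is precisely what leads us to include this assumption in the theorem statement.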



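To bound the size of $\Delta$ on the frequency grid, we use Theorem D.1:

$$\sum_{k=1}^{n} |\Delta(e^{i\theta_k})|^2 = \|\mathcal{L}(\hat G) - \mathcal{L}(G_\star)\|_2^2 \le 2\mu\,\|\mathcal{L}(G_\star)\|_{\mathcal{L}(\mathcal{A}_\epsilon)} \le \frac{2\mu}{1 - \delta}\,\|G_\star\|_{\mathcal{A}}.$$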



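Let $\omega \sim \mathcal{N}(0, \sigma^2 I_n)$. Using the well-known upper bound for the maximum of Gaussian variables (see, for example, [H]), we have

$$\mathbb{E}\big[\|\omega\|_{\mathcal{L}(\mathcal{A}_\epsilon)}^*\big] = \mathbb{E}\Big[\sup_{a \in \mathbb{D}_\rho^{(\epsilon)}} \langle \mathcal{L}(\varphi_a), \omega \rangle\Big] \le \sigma \sup_{a \in \mathbb{D}_\rho^{(\epsilon)}} \|\mathcal{L}(\varphi_a)\|_2 \sqrt{2\log|\mathbb{D}_\rho^{(\epsilon)}|}.$$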



Now, ||£((/9a)||2 < 1 1 '/'a 1 1 Woo ^ 2-y/n. Moreover, by a simple volume argument, iDp*^^] < ^a^^^^^^a^a • 
To see this, suppose 5 is a maximal set of points on Dp which are separated by at least r. The 



maximal size of such a set is at most Moreover, |5| > 
particular, note that we have E[||a;||^j._^ ^] 



. Now set r 



Now we can put all of the ingredients together. 



7r(l-p)(5 
16p 



. In 



iTT\\0j\\l 1 + p\ 1 

fi'^ 1 — pj n 



IGvr 1 + p 



k=l 



< 1 + 27r 



< 59 




*\\A 



r i|2 



+ 



|2 
1^ 



{I- p)5)V n{l-5Y n{l-5) 



as desired. 

Applying the inequality $\|G_\star\|_{\mathcal{A}} \le 8\|\Gamma_{G_\star}\|_1$ from Theorem 3.1 completes the proof.